AITopics | egocentric view

egocentric view

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

abe8e03e3ac71c2ec3bfb0de042638d8-Supplemental.pdf

Neural Information Processing Systems, Feb-10-2026, 14:38:16 GMT

c-bet, egocentric view, panoramic view, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.99)

Did you just see that? Arbitrary view synthesis for egocentric replay of operating room workflows from ambient sensors

Zhang, Han, Seenivasan, Lalithkumar, Porras, Jose L., Soberanis-Mukul, Roger D., Ding, Hao, Shu, Hongchao, Killeen, Benjamin D., Ghosh, Ankita, Yarmus, Lonny, Ishii, Masaru, Argento, Angela Christine, Unberath, Mathias

arXiv.org Artificial Intelligence, Oct-7-2025

Observing surgical practice has historically relied on fixed vantage points or recollections, leaving undocumented the egocentric visual perspectives that guide clinical decisions. Fixed-camera video can capture surgical workflows at room scale but cannot reconstruct what each team member actually saw. These videos therefore offer only limited insight into how decisions affecting surgical safety, training, and workflow optimization are made. Here we introduce EgoSurg, the first framework to reconstruct dynamic, egocentric replays for any operating room (OR) staff member directly from wall-mounted fixed-camera video, and thus without intervening in the clinical workflow. EgoSurg couples geometry-driven neural rendering with diffusion-based view enhancement, enabling high-fidelity synthesis of arbitrary and egocentric viewpoints at any moment. In evaluations across multi-site surgical cases and controlled studies, EgoSurg reconstructs person-specific visual fields and arbitrary viewpoints with high visual quality and fidelity. By transforming existing OR camera infrastructure into a navigable, dynamic 3D record, EgoSurg establishes a new foundation for immersive surgical data science, enabling surgical practice to be visualized, experienced, and analyzed from every angle.

artificial intelligence, machine learning, workflow, (17 more...)

arXiv.org Artificial Intelligence

2510.04802

Country: North America (0.28)

Genre:

Research Report > Experimental Study (0.89)
Research Report > New Finding (0.68)

Industry:

Health & Medicine > Surgery (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

A Supplemental Details

Neural Information Processing Systems, Aug-16-2025, 17:16:54 GMT

Here, we formally define all intrinsic rewards evaluated in the paper. All algorithms use the same network architectures. Observation counts and change counts are based on egocentric and panoramic views, respectively. In MiniGrid, egocentric views are 147-dimensional, while panoramic views are 588-dimensional. Gradients are clipped to a maximum norm of 40.
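As a rough illustration of the bookkeeping described above, the sketch below keeps visitation counts keyed on a hashed egocentric observation and on a hashed change between consecutive panoramic views, turns each count into a generic count-based bonus 1/sqrt(N), and clips gradients to a global norm of 40. The bonus form, the helper names (`CountBonus`, `clip_by_global_norm`), and the reading of the dimensions as 147 = 7 × 7 × 3 and 588 = 4 × 147 are assumptions for illustration; this is not the paper's formal definition of its intrinsic rewards.

```python
import hashlib
import math
from collections import Counter

import numpy as np

EGO_SHAPE = (7, 7, 3)             # assumption: MiniGrid egocentric grid, 7*7*3 = 147
EGO_DIM = int(np.prod(EGO_SHAPE))
PANO_DIM = 4 * EGO_DIM            # assumption: panoramic view = 4 egocentric views = 588


def _key(x: np.ndarray) -> str:
    """Hash an array's bytes so it can serve as a count-table key."""
    return hashlib.sha1(np.ascontiguousarray(x).tobytes()).hexdigest()


class CountBonus:
    """Hypothetical count-based bonuses: 1/sqrt(N(observation)) and 1/sqrt(N(change))."""

    def __init__(self) -> None:
        self.obs_counts = Counter()
        self.change_counts = Counter()

    def observation_bonus(self, ego_obs: np.ndarray) -> float:
        # Observation counts use the egocentric view.
        assert ego_obs.size == EGO_DIM
        k = _key(ego_obs)
        self.obs_counts[k] += 1
        return 1.0 / math.sqrt(self.obs_counts[k])

    def change_bonus(self, pano_prev: np.ndarray, pano_next: np.ndarray) -> float:
        # Change counts use the panoramic view; the change is represented here
        # (an assumption) as the element-wise difference of consecutive views.
        assert pano_prev.size == PANO_DIM and pano_next.size == PANO_DIM
        change = pano_next.astype(np.int64) - pano_prev.astype(np.int64)
        k = _key(change)
        self.change_counts[k] += 1
        return 1.0 / math.sqrt(self.change_counts[k])


def clip_by_global_norm(grads, max_norm=40.0):
    """Rescale gradient arrays so their joint L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total = math.sqrt(sum(float(np.sum(g.astype(np.float64) ** 2)) for g in grads))
    if total <= max_norm:
        return [g.copy() for g in grads], total
    scale = max_norm / total
    return [g * scale for g in grads], total
```

In a PyTorch training loop, the same global-norm rescaling is provided by `torch.nn.utils.clip_grad_norm_(parameters, max_norm=40)`.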

c-bet, egocentric view, panoramic view, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.99)

EgoWorld: Translating Exocentric View to Egocentric View using Rich Exocentric Observations

Park, Junho, Ye, Andrew Sangwoo, Kwon, Taein

arXiv.org Artificial Intelligence, Jun-24-2025

Egocentric vision is essential for both human and machine visual understanding, particularly in capturing the detailed hand-object interactions needed for manipulation tasks. Translating third-person views into first-person views significantly benefits augmented reality (AR), virtual reality (VR), and robotics applications. However, current exocentric-to-egocentric translation methods are limited by their dependence on 2D cues, synchronized multi-view settings, and unrealistic assumptions, such as requiring an initial egocentric frame and relative camera poses during inference. To overcome these challenges, we introduce EgoWorld, a novel two-stage framework that reconstructs an egocentric view from rich exocentric observations, including projected point clouds, 3D hand poses, and textual descriptions. Our approach reconstructs a point cloud from estimated exocentric depth maps, reprojects it into the egocentric perspective, and then applies diffusion-based inpainting to produce dense, semantically coherent egocentric images. Evaluated on the H2O and TACO datasets, EgoWorld achieves state-of-the-art performance and demonstrates robust generalization to new objects, actions, scenes, and subjects. Moreover, EgoWorld shows promising results even on unlabeled real-world examples.
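The geometric first stage described above (depth map, then point cloud, then reprojection into another camera) can be sketched with a standard pinhole model. This is a generic illustration under assumed inputs: a metric depth map, shared intrinsics `K`, and a known 4×4 transform `T_dst_from_src`. The abstract presents EgoWorld as avoiding known relative camera poses at inference, so the sketch does not reflect how it obtains the egocentric viewpoint, and the function names are hypothetical. Pixels that receive no point are returned as holes, which is the kind of gap a diffusion-based inpainting stage would fill.

```python
import numpy as np


def unproject_depth(depth, K):
    """Lift an HxW depth map to 3D points in the source camera frame.

    Returns the (N, 3) points for pixels with positive depth and the flat
    boolean mask of those pixels (so matching colors can be selected).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u = column, v = row
    z = depth.reshape(-1)
    valid = z > 0
    pix = np.stack([u.reshape(-1), v.reshape(-1), np.ones(h * w)])  # 3 x HW homogeneous pixels
    rays = np.linalg.inv(K) @ pix  # back-project pixels to rays with unit depth
    points = (rays * z).T          # scale each ray by its measured depth
    return points[valid], valid


def reproject(points, colors, K, T_dst_from_src, h, w):
    """Splat colored 3D points into an h x w target view using a z-buffer.

    Returns the rendered image and a boolean mask of pixels no point reached.
    """
    homo = np.hstack([points, np.ones((len(points), 1))])
    cam = (T_dst_from_src @ homo.T).T[:, :3]
    in_front = cam[:, 2] > 1e-6  # drop points behind the target camera
    cam, colors = cam[in_front], colors[in_front]
    proj = (K @ cam.T).T
    u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, colors = u[inside], v[inside], cam[inside, 2], colors[inside]
    image = np.zeros((h, w, colors.shape[1]), dtype=colors.dtype)
    filled = np.zeros((h, w), dtype=bool)
    # Paint far-to-near so the nearest point at each pixel is written last.
    for i in np.argsort(-z):
        image[v[i], u[i]] = colors[i]
        filled[v[i], u[i]] = True
    return image, ~filled
```

With the identity transform the input image is reproduced; any real viewpoint change leaves holes from occlusions and out-of-view regions.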

egoworld, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2506.17896

Genre: Research Report (0.82)

Industry: Education (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views

Neural Information Processing Systems, May-27-2025, 03:19:08 GMT

Understanding egocentric human-object interaction (HOI) is a fundamental aspect of human-centric perception, facilitating applications such as AR/VR and embodied AI. For egocentric HOI, in addition to perceiving semantics, e.g., "what" interaction is occurring, capturing "where" the interaction specifically manifests in 3D space is also crucial, as it links perception and operation. Existing methods primarily leverage observations of HOI to capture interaction regions from an exocentric view. However, incomplete observations of the interacting parties in the egocentric view introduce ambiguity between visual observations and interaction contents, impairing the efficacy of these methods. From the egocentric view, humans integrate the visual cortex, cerebellum, and brain to internalize their intentions and interaction concepts of objects, allowing them to pre-formulate interactions and act even when interaction regions are out of sight.

artificial intelligence, egochoir, human-object interaction region, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.44)

Ego-to-Exo: Interfacing Third Person Visuals from Egocentric Views in Real-time for Improved ROV Teleoperation

Abdullah, Adnan, Chen, Ruo, Rekleitis, Ioannis, Islam, Md Jahidul

arXiv.org Artificial Intelligence, Jun-30-2024

Underwater ROVs (Remotely Operated Vehicles) are unmanned submersible vehicles designed for exploring and operating in the depths of the ocean. Despite using high-end cameras, typical teleoperation engines based on first-person (egocentric) views limit a surface operator's ability to maneuver and navigate the ROV in complex deep-water missions. In this paper, we present an interactive teleoperation interface that (i) offers on-demand "third"-person (exocentric) visuals from past egocentric views, and (ii) provides enhanced peripheral information with an augmented ROV pose in real time. We achieve this by integrating a 3D geometry-based Ego-to-Exo view synthesis algorithm into a monocular SLAM system for accurate trajectory estimation. The proposed closed-form solution uses only past egocentric views from the ROV and a SLAM backbone for pose estimation, which makes it portable to existing ROV platforms. Unlike data-driven solutions, it is invariant to applications and waterbody-specific scenes. We validate the geometric accuracy of the proposed framework through extensive experiments in 2-DOF indoor navigation and 6-DOF underwater cave exploration under challenging low-light conditions. We demonstrate the benefits of dynamic Ego-to-Exo view generation and real-time pose rendering for remote ROV teleoperation by following navigation guides such as cavelines inside underwater caves. This new way of interactive ROV teleoperation opens up promising opportunities for future research in underwater telerobotics.
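The core geometric idea of showing the vehicle from a past egocentric frame can be sketched in two steps: choose a past SLAM keyframe that lies some distance behind the current position, then project the current ROV position into that frame so it can be drawn as an overlay. This is a simplified illustration, not the paper's closed-form Ego-to-Exo algorithm; the helper names, the `min_dist` threshold, and the 4×4 world-from-camera pose convention are assumptions.

```python
import numpy as np


def select_past_view(poses, current_idx, min_dist=1.0):
    """Return the most recent past index whose camera is >= min_dist from the current one.

    poses: list of 4x4 world-from-camera matrices from a SLAM trajectory (assumed).
    Returns None if no past pose is far enough away.
    """
    cur = poses[current_idx][:3, 3]
    for i in range(current_idx - 1, -1, -1):
        if np.linalg.norm(poses[i][:3, 3] - cur) >= min_dist:
            return i
    return None


def project_into_view(point_world, T_world_from_cam, K):
    """Pixel (u, v) of a world point in a past camera, or None if it lies behind that camera."""
    T_cam_from_world = np.linalg.inv(T_world_from_cam)
    p = T_cam_from_world @ np.append(point_world, 1.0)
    if p[2] <= 1e-6:
        return None
    uvw = K @ p[:3]
    return uvw[0] / uvw[2], uvw[1] / uvw[2]
```

Picking a frame from behind the vehicle is what lets a past egocentric image act as a "third-person" camera; a fuller version would also check that the chosen frame's field of view actually contains the vehicle.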

ego-to-exo, egocentric view, improved rov teleoperation, (1 more...)

arXiv.org Artificial Intelligence

2407.00848

Genre: Research Report (0.40)

Industry: Electrical Industrial Apparatus (1.00)

Technology:

Information Technology > Architecture > Real Time Systems (0.80)
Information Technology > Artificial Intelligence > Robots (0.53)

EgoGen: An Egocentric Synthetic Data Generator

Li, Gen, Zhao, Kaifeng, Zhang, Siwei, Lyu, Xiaozhong, Dusmanu, Mihai, Zhang, Yan, Pollefeys, Marc, Tang, Siyu

arXiv.org Artificial Intelligence, Jan-16-2024

Understanding the world from a first-person view is fundamental in Augmented Reality (AR). This immersive perspective brings dramatic visual changes and unique challenges compared to third-person views. Synthetic data has empowered third-person-view vision models, but its application to embodied egocentric perception tasks remains largely unexplored. A critical challenge lies in simulating natural human movements and behaviors that effectively steer the embodied cameras to capture a faithful egocentric representation of the 3D world. To address this challenge, we introduce EgoGen, a new synthetic data generator that can produce accurate and rich ground-truth training data for egocentric perception tasks. At the heart of EgoGen is a novel human motion synthesis model that directly leverages egocentric visual inputs of a virtual human to sense the 3D environment. Combined with collision-avoiding motion primitives and a two-stage reinforcement learning approach, our motion synthesis model offers a closed-loop solution where the embodied perception and movement of the virtual human are seamlessly coupled. Compared to previous works, our model eliminates the need for a pre-defined global path and is directly applicable to dynamic environments. Combined with our easy-to-use and scalable data generation pipeline, we demonstrate EgoGen's efficacy in three tasks: mapping and localization for head-mounted cameras, egocentric camera tracking, and human mesh recovery from egocentric views. EgoGen will be fully open-sourced, offering a practical solution for creating realistic egocentric training data and aiming to serve as a useful tool for egocentric computer vision research. Refer to our project page: https://ego-gen.github.io/.

computer vision, dataset, egogen, (16 more...)

arXiv.org Artificial Intelligence

2401.08739

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(12 more...)

Genre: Research Report (0.81)

Industry:

Information Technology (0.46)
Energy > Renewable > Geothermal (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

How to Build a Real-time Hand-Detector using Neural Networks (SSD) on TensorFlow

@machinelearnbot, Dec-12-2017, 10:32:00 GMT

For example, these algorithms might get confused if the background is unusual, if sharp changes in lighting conditions cause sharp changes in apparent skin color, or if the tracked object becomes occluded.

artificial intelligence, dataset, machine learning, (12 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)
